4 research outputs found

A Scalable Near-Memory Architecture for Training Deep Neural Networks on Large In-Memory Datasets

Author: Benini L.
Gurkaynak F.K.
Schaffner M.
Schuiki F.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

Most investigations into near-memory hardware accelerators for deep neural networks have primarily focused on inference, while the potential of accelerating training has received relatively little attention so far. Based on an in-depth analysis of the key computational patterns in state-of-the-art gradient-based training methods, we propose an efficient near-memory acceleration engine called NTX that can be used to train state-of-the-art deep convolutional neural networks at scale. Our main contributions are: (i) a loose coupling of RISC-V cores and NTX co-processors reducing offloading overhead by 7× over previously published results; (ii) an optimized IEEE 754 compliant data path for fast high-precision convolutions and gradient propagation; (iii) evaluation of near-memory computing with NTX embedded into residual area on the Logic Base die of a Hybrid Memory Cube; and (iv) a scaling analysis to meshes of HMCs in a data center scenario. We demonstrate a 2.7× energy efficiency improvement of NTX over contemporary GPUs at 4.4× less silicon area, and a compute performance of 1.2 Tflop/s for training large state-of-the-art networks with full floating-point precision. At the data center scale, a mesh of NTX achieves above 95 percent parallel and energy efficiency, while providing 2.1× energy savings or 3.1× performance improvement over a GPU-based system.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Variable delay ripple carry adder with carry chain interrupt detection

Author: Burg A.
Fichtner W.
Gurkaynak F.K.
Kaeslin H.
Publication venue: IEEE Service Center, 445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855-1331 USA
Publication date: 06/06/2011

A statistical approach for the area-efficient implementation of fast wide-operand adders using early termination detection is described and analyzed. It is shown that high throughput can be achieved based on area- and routing-efficient ripple-carry adders with only marginal overhead. They share a low AT-product with Brent-Kung adders but provide designers with totally different area/delay tradeoffs. The circuit does not require full-custom design and fits well into both self-timed and synchronous designs.

Infoscience - École polytechnique fédérale de Lausanne
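
The statistical idea behind the variable-delay adder entry above can be illustrated in software. The following is a minimal sketch, not the circuit from the paper: it assumes that the delay of a ripple-carry addition with early termination is governed by the longest run of bit positions through which a carry actually propagates, and it estimates that run length for random operands. The function names `longest_carry_chain` and `average_chain` are hypothetical and chosen only for this illustration.

```python
import random


def longest_carry_chain(a: int, b: int, width: int) -> int:
    """Longest run of consecutive bit positions that produce a carry-out
    when adding a + b with carry-in 0 (illustrative model only)."""
    longest = 0
    current = 0   # length of the carry chain ending at the current bit
    carry = 0     # carry into the current bit
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x & y:
            # Generate: a new carry starts here regardless of the carry-in.
            current = 1
            carry = 1
        elif (x ^ y) and carry:
            # Propagate: the incoming carry ripples one position further.
            current += 1
        else:
            # Kill, or propagate without a carry-in: the chain is interrupted.
            current = 0
            carry = 0
        longest = max(longest, current)
    return longest


def average_chain(width: int, trials: int, seed: int = 0) -> float:
    """Average longest carry chain over uniformly random operands."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += longest_carry_chain(
            rng.getrandbits(width), rng.getrandbits(width), width
        )
    return total / trials


if __name__ == "__main__":
    for w in (16, 32, 64, 128):
        print(w, average_chain(w, 2000))
```

Under this model, uniformly random operands tend to give a longest chain much shorter than the operand width, which is the kind of observation that makes detecting early completion attractive. Operand distributions in real workloads may differ from the uniform assumption used here.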

Efficient ASIC implementation of a real-time depth mapping stereo vision system

Author: Burg A.
Felber N.
Fichtner W.
Gurkaynak F.K.
Isler O.
Kaeslin H.
Kuhn M.
Moser S.
Publication venue: IEEE Service Center, 445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855-1331 USA
Publication date: 06/06/2011

This paper presents a fast and area-efficient implementation of a real-time stereo vision algorithm for spatial depth mapping. The design combines two well-known area-based approaches to stereo matching and includes an occlusion detection method. Hardware efficiency is achieved by storing only partial images on-chip, avoiding full-sized frame buffers. A low-latency dataflow-oriented structure makes it possible to process 256 × 192 pixel input streams at a rate in excess of 50 frames per second, amounting to more than 54 million pixel × disparity measurements per second (PDS) (for a 25-pixel disparity range), or roughly 18 GOPS. The design has been integrated in a 0.25 µm standard CMOS technology and occupies an area of less than 3 mm².

Infoscience - École polytechnique fédérale de Lausanne

Real-time high-sensitivity impedance measurement interface for tethered BLM biosensor arrays

Author: Benini L.
De Micheli G.
Guiducci C.
Gurkaynak F.K.
Leblebici Y.
Temiz Y.
Terrettaz S.
Vogel H.
Publication venue: New York
Publication date: 01/01/2008

This paper presents a switched-capacitor (SC) current integrator circuit for impedance measurement of tethered bilayer lipid membrane (tBLM) biosensors. The circuit comprises a small number of high-performance components, enabling enhanced experimental flexibility and reliability. The sensitivity is improved significantly by suppressing the output offset through pseudo-differential operation, using R-C components for the reference impedance. The sensing and reference electrodes are excited with low-amplitude differential voltage pulses, and the current response to a change in the membrane resistance (RM) of the tBLM biosensor is converted to voltage by a precision, low-noise SC integrator available as a single-package IC. Tests with both electrical models and actual biosensors demonstrated that the proposed circuit operates with high sensitivity and can be used in single-chip versions for low-cost, high-sensitivity tBLM biosensor arrays featuring multiple electrode sites.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna